Skip to content

Conversation

@MSHelm
Copy link

@MSHelm MSHelm commented Dec 10, 2025

Fixes #348

The parser now tries to use the filename stored in the tiff metadata, if it exists. This is more robust, as users typically do not modify these entries.
If this metadata doesnt exist (which is the case for older data), it falls back to the actual filename.

Since the lung dataset used for testing is using an old format, I added a newer data set that is publicy available (https://zenodo.org/records/14008816). I documented this in the tests as well.

Full disclosure: I am working for Miltenyi, but I love your work ;)

@MSHelm MSHelm changed the title Parse cycle from ame in tiff metadata, not actual filename Parse macsima cycle from name in tiff metadata, not actual filename Dec 10, 2025
@codecov-commenter
Copy link

codecov-commenter commented Dec 10, 2025

Codecov Report

❌ Patch coverage is 11.89189% with 163 lines in your changes missing coverage. Please review.
✅ Project coverage is 52.70%. Comparing base (23bcb72) to head (904abfc).
⚠️ Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
src/spatialdata_io/readers/macsima.py 11.89% 163 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #349      +/-   ##
==========================================
- Coverage   55.07%   52.70%   -2.37%     
==========================================
  Files          26       26              
  Lines        2809     2975     +166     
==========================================
+ Hits         1547     1568      +21     
- Misses       1262     1407     +145     
Files with missing lines Coverage Δ
src/spatialdata_io/readers/macsima.py 20.82% <11.89%> (-7.76%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@MSHelm
Copy link
Author

MSHelm commented Dec 12, 2025

I checked the tests and the failing ones are all due to some Xenium and Visium HD readers, which I didnt touch XD Did they exist before this PR already?

@MSHelm
Copy link
Author

MSHelm commented Dec 16, 2025

Hey everyone,
with the last commits this turned into a bit of a bigger contribution. With this PR, we now solely rely on the metadata stored in the OME object. This should be significantly more robust than parsing file names, and enables us to add more metadata. In principle there would be even more information available than I currently implemented, but I think the most interesting bits are covered here.

I implemented 2 versions of the parser, as there are 2 different formats of the metadata. I removed the previously used data sets, as they are not part of the CI. Instead, I added 2 publicly available data sets, one for each version.

I tried modifying the github action to download them, but it seems that this was not really successful. If someone can give me some pointers that would be wonderful.

Cheers!

@MSHelm MSHelm changed the title Parse macsima cycle from name in tiff metadata, not actual filename Parse entire macsima metadata from OME metadata Dec 16, 2025
@LucaMarconato
Copy link
Member

Hi @MSHelm, thanks for the contribution! I’ve fixed the unrelated failing tests on main, and your PR is now green.

I had a quick look but before full review I kindly ask you to verify and confirm which items from the contribution guide checklist apply to this PR. Ensuring the contributions guidelines are followed help streamline our review process and improve long-term code maintainability. Much appreciated!

@MSHelm
Copy link
Author

MSHelm commented Jan 6, 2026

Hey @LucaMarconato

As requested below is the checklist.

Over christmas I also thought about whether we should be conservative or lenient with missing metadata?
The current implementation is a bit wonky, as the parsers themselves just create a dictionary that allows missing metadata. But the actual ChannelMetadata object only allows missing data for clones (for example for DAPI stainings). This can lead to unintuitive errors. Either I would change the ChannelMetadata object to allow missing metadata, except for name and cycle (as in the previous implementation), or be strict and directly return a ChannelMetadata object from the parsers.

Regarding the test data, I will provide some small versions of OMAP10 and OMAP23 via zenodo for the CI. Should then these be used for the unit tests?
It seems to me that the large original data is currently not fetched in the CI, and no tests on macsima are run due to this.

Checklist

  • Specification

    • Provide a link to the public specification/changelog of the raw data format (if no public specification available yet, reach out to scverse privately (Zulip/email)) -> See Zulip message
  • Reader implementation

    • Implement reader functions in src/spatialdata_io/readers/ (if no formal spec available, place under experimental/)
    • Place string constants in src/spatialdata_io/_constants/_constants.py
  • Example data

    • Provide (very) small public test dataset(s) (~100kB–10MB, preferred), licensed with a permissive license (e.g. CC BY 4.0)
    • Ensure dataset(s) cover edge cases of the format
    • In addition to the small dataset(s), consider creating scripts for downloading and converting a real dataset in spatialdata-sandbox:
    • If the data is not available online, upload it to the "SpatialData Submissions" Zenodo community
    • We do not support private data, with the exception of data that will soon be made available to the public. In such cases, please reach out to us. Then, please provide a private repo with download.py and to_zarr.py (or upload the data privately to the "SpatialData Submissions" Zenodo community)
  • Tests

    • Add test functions for the reader
    • Test multiple versions of the raw format if applicable
    • Verify visualization/alignment with spatialdata-plot or napari-spatialdata
    • Add proxy tests for spatial extent correctness (spatialdata.get_extent())
    • Add proxy tests for coloring/annotations of selected elements/features -> will do once the small data set is done and integrated
    • Add tests for auxiliary/helper functions if present
  • Extra

    • Update CLI in src/spatialdata_io/__main__.py if reader/converter is added/modified
    • Consider contributing to the public project board (currently view only, please reach out for edits)
    • Provide feedback on this contribution guide to improve it further

@LucaMarconato
Copy link
Member

Thank you for the update and for going through the checklist!

Regarding the test data, I will provide some small versions of OMAP10 and OMAP23 via zenodo for the CI. Should then these be used for the unit tests?
It seems to me that the large original data is currently not fetched in the CI, and no tests on macsima are run due to this.

Yes please, the small datasets (and only the small datasets) should be included in the CI.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[MACSIMA] Parsing cycle number from filename doesnt work for renamed files

3 participants